ABSTRACT


Intelligence-production activities are typically viewed as part of an intelligence cycle,
consisting of planning, collection, processing, analysis, and dissemination stages. Once a
request for information is issued, the intelligence agencies mostly deal with the collection 
and processing activities of the cycle. However, in most situations, there is an enormous 
amount of data to be collected. This overabundance of information requires methods that 
select only the useful data, to prevent intelligence personnel from wasting time and effort 
on non-relevant data. Online learning is an area of research that has gained attention in 
recent years with applications in areas such as web advertising, classification, and decision 
making. In this thesis, we develop a model aimed at the collection and processing phases 
of the intelligence cycle, applicable in situations where the data is obtained sequentially, so 
that learning algorithms are realistic. We analyze the performance of a modified Thompson
Sampling algorithm to help intelligence analysts make good decisions regarding the
sources from which to collect/process as well as the collection/processing capacity and
its allocation over time, in order to keep the risk of missing valuable information below a
certain threshold.
Executive Summary


Intelligence activities are part of a framework called the intelligence cycle. The main phases in
the cycle are planning, collection, processing, analysis, and dissemination of information.
Some of the data collected from the operational environment is discarded due to irrelevance.
Collecting and handling this discarded data wastes the time and effort of intelligence organizations.

As stated in [1], “The goal of online learning is to make a sequence of accurate
predictions given knowledge of the correct answer to previous prediction tasks and possibly 
additional available information.” Online learning is used when learning with training data 
is infeasible, or the data is non-stationary. It is also used to adapt to the changes in the 
environment. Learning as one goes along can be more robust than specifying a model and 
using mathematical optimization [2]. Although it was first defined in machine-learning 
literature, its application can be found in other areas such as optimization, game theory, 
and statistical modeling. 

The main goal of the thesis is to address the challenge of how to efficiently explore the data 
originating from a large number of sources. For this purpose, we propose a model that is 
suitable for most of the situations in the collection and processing phases. In particular, we 
modify the well-known Thompson Sampling (TS) algorithm, originating in the machine- 
learning community, to sample from more than one source at a time, as is the case in 
intelligence organizations that analyze large numbers of items concurrently. 

We address the following questions: 


• How can we create a balance between exploration and exploitation to maximize the 
benefit when making decisions as to the collection, processing, or analysis of data if 
there are time or resource constraints? 

• How can we decide what amount of resources should be allocated for collection,
processing, or analysis if we cannot change the allocation in the short term?

• How can we adjust the allocation dynamically while learning occurs online? 

• How can we quantify the risk of missing relevant data versus the amount of resources 
allocated? 

• How can we use the data gathered to gain insights about the population of the sources
that generated the data? 


We modified the TS algorithm so that we can explore an arbitrarily large number of sources,
limited only by the resources of the intelligence organization. The measure of performance
is the regret: the difference between the rewards that could have been obtained had the
analyst known the true nature of each source of intelligence and the rewards obtained by
the algorithm. The expected regret has recently been shown to grow sublinearly in the number
of time periods (T), so that learning does indeed occur. TS, in its base form, leads to
regret on the order of O(log T), meaning that the expected average regret goes to zero as
the time horizon T increases.

Our main conclusions and contributions can be summarized as follows: 


• The model described can be used to allocate the collection/processing resources/efforts
efficiently.

• The suggested algorithm employed in the model yields sublinear regret in
the simulations we conducted, meaning that the average regret tends to zero as the
number of time periods gets larger.

• The model can be adapted to situations in which prior knowledge about the sources 
exists. 

• We consider the capacity allocated as possibly changing over time, as information 
becomes available. With this approach, intelligence agencies can better control the 
regret in the exploration phase and avoid using excess capacity. 

• The model can also be employed to gain insights about the risk of missing relevant
data, which provides further guidance for the capacity required.

• We also use the Expectation Maximization (EM) algorithm to estimate the distributional
parameters for the statistical model of the candidate subpopulations as data is
collected.
CHAPTER 1:

Introduction 


Knowledge is power only if man knows what facts not to bother with. 

Robert Staughton Lynd [1] 


1.1 Introduction 

The amount of data available to intelligence agencies has skyrocketed in recent years in 
parallel to technological advances. Fully handling this huge amount of data is beyond the 
capabilities (human and technological) of any organization if done in a naive manner. 

The challenge is well summarized by Hedley: 

In the twenty-first century, a principal analytic challenge lies in the sheer volume
of information available. Although especially hard targets such as terrorist
cells are no less difficult to penetrate, the explosion of open-source information
from news services and the Worldwide Web makes the speed and volume
of reporting more difficult to sift through. Advances in information technology
both help and hinder, as analysts strive to cope with the “noise,” the chaff
they must winnow away. Data multiply with dizzying speed. Whereas collecting
solid intelligence information was the overriding problem of the past,
selecting and validating it loom ever larger as problems for analysts today. [2]

Intelligence activities are commonly considered within a framework called the Intelligence 
Cycle. The main phases in the cycle are planning, collection, processing, analyzing, and 
dissemination of the information. The detailed discussion of these stages is provided in 
Chapter 2. 

Some of the data collected from the operational environment is discarded because it is 
irrelevant by the point when final judgments are made by analysts. According to Wirtz [3], 
“Observers ritualistically point out that analysts are constantly at risk of being overwhelmed
by a deluge of information from both open and classified sources. Yet, the real danger may 
be the fact that, within this data stream, there is little valuable information about the highest 
priority targets and issues facing analysts.” 

Besides the relevancy of the data, the quality of the source that generates the data has to be 
understood in order to focus on the sources that are more likely to produce relevant data. 

Online learning is characterized by updating the beliefs about the ground truth, which is 
unknown, as new data becomes available. Because uncertainty is reduced as a result of
exploration, one ought to adapt one's decisions in light of new information. This approach is
applicable to most situations in which decisions are made in sequence. In some
collection contexts, data is collected in a sequential manner that allows analysts to apply
online learning methods to assess the quality of the sources. For example, the sources could
be a number of tracked Twitter accounts. Collecting the tweets costs time and effort.
The data (i.e., messages) from these sources (i.e., Twitter accounts)
becomes available sequentially. It is straightforward to come up with other similar scenarios
for which online learning methods are germane. In summary, gaining information about 
the quality of the sources allows the analyst to select a subset of them, given the capacity 
and time constraints. 


1.2 Research Questions and Methodology 

In this study, we adopt a model that allows us to benefit from the ideas developed within the 
online learning community. We adapt the well-known Thompson Sampling (TS) algorithm 
to a case in which many samples can be explored at a time. We address the following 
questions (see Chapters 3 and 4 for more details): 

• How can we create a balance between exploration and exploitation to maximize the 
benefit when making decisions as to the collection, processing, or analysis of data if 
there are time or resource constraints? 

• How can we decide what amount of resources should be allocated for collection,
processing, or analysis if we cannot change the allocation in the short term?

• How can we adjust the allocation dynamically while learning occurs online? 

• How can we quantify the risk of missing relevant data vs. the amount of resources
allocated?

• How can we use the data gathered to gain insights about the population of the sources
that generated the data?

Initially, we develop a modeling framework that is suitable to answer the aforementioned
questions and then analyze its performance with numerical examples.

1.3 Scope 

We intentionally define the model to be analyzed in a generic fashion. The sources in the
model could be discrete portions of a wide geographical area. Items from these sources
are collected with a UAV, processed using specific algorithms at headquarters, and further
analyzed by a human analyst. To keep the model as simple as possible, in this work, we do
not consider specific issues for each setting (e.g., UAV travel time between non-adjacent
geographic locations).

In Table 1.1, we include several other illustrative examples.


Table 1.1: Examples for Sources and Capacities (Resources) for Different Phases

Phase      | Source (Where to sample)                                            | Resource
Collection | A geographical area; a frequency band; an edge of a social network  | A satellite; a signal interceptor
Processing | Data aggregated from the collection phase                           | Decryption tool; automatic translator
Analysis   | Processed data; translated, decrypted message; restructured data    | Human analysts
1.4 Structure of the Thesis

This thesis consists of five chapters. Chapter 2 is dedicated to a background on intelligence 
activities as well as the literature review. Chapter 3 introduces the modeling framework, 
including the assumptions and the proposed approach. In Chapter 4, we analyze the model 
from Chapter 3 to answer the research questions. Also, Chapter 4 contains numerical results 
obtained from the simulations. In Chapter 5, we offer conclusions. 
CHAPTER 2:

Background and Literature Review 


In Chapter 2, we provide information about intelligence and online learning. Then, we 
discuss to what extent the activities performed in the intelligence cycle are related to, and 
can be modeled by, online learning. Finally, we look at the related studies on the overlap. 


2.1 Intelligence 

Intelligence is a means to an end [4]. From a state perspective, the most important goal 
of intelligence is to provide security to the people. Almost all states have dedicated
intelligence agencies. These agencies have similar structures and procedures. Their typical
missions are to collect relevant information and to conduct objective analyses. One of 
the key challenges is to leverage technological advances for better performance in agency 
missions [5]. 

2.1.1 Definition and Categories of Intelligence 

Intelligence is an elusive term, so we need to clarify its meaning. A broad definition may 
be the information that has been collected, processed, and analyzed for the use of decision/policy
makers. Besides the final product, the term intelligence is also used to refer to
the process through which it is produced, the organization that produces it, or the whole 
Intelligence Community [6]. Formal definitions of intelligence from the Department of 
Defense Dictionary of Military and Associated Terms [7] are as follows: 


The product resulting from the collection, processing, integration, evaluation, 
analysis, and interpretation of available information concerning foreign nations,
hostile or potentially hostile forces or elements, or areas of actual or
potential operations. 

The activities that result in the product. 

The organizations engaged in such activities. 
2.1.2 Intelligence Cycle

Although intelligence officers admit that effective intelligence efforts are not cyclic [8], 
production and consumption of the intelligence is traditionally considered a cyclic process, 
as shown in Figure 2.1. The main steps in the cycle include identifying requirements/needs, 
planning and direction, collection, processing, analysis and production, and dissemination
[8].

The steps or categories of the cycle represent the related activities conducted by the agencies.
Activities in each category may happen concurrently, or some steps may be bypassed.
For example, a requirement may be addressed by analyzing existing data without any collection
effort.

Briefly, the cycle can be explained as follows: 

• Consumers determine the requirements and the priorities. 

• Agencies plan all the efforts necessary through the process until the delivery of the 
final product to the consumer. 

• Data is collected via intelligence gathering disciplines. 

• Huge amounts of data are processed and converted into a form usable by the analysts. 

• Analysts determine the relevancy and the importance of the data. 

• The intelligence is disseminated to the consumer who demanded it. 
Figure 2.1: Intelligence Cycle.


Raw data is filtered out depending on its relevancy (importance) as it goes through the
phases of the intelligence cycle, and it takes on different names along the way: data, information,
and intelligence. The relationship between data, information, and intelligence [9]
is shown in Figure 2.2. The operational environment harbors all the information we collect,
but we generally collect only a fraction of it. Because we do not know the exact importance
of data before it goes through the entire cycle, the decision regarding where to collect
the data is itself a filtration. After the data is collected, it is (pre)processed and, therefore,
subject to another filtration. Finally, analysts examine the information and possibly discard
some portion of the information as irrelevant. Throughout the collection phase, it is important
to collect the least amount of data needed so as to increase efficiency. This efficiency
requirement drives our motivation to apply online learning ideas when appropriate.



Figure 2.2: Relationship between Data, Information and Intelligence. Adapted from [9]. 


2.1.3 Intelligence Collection 

In the collection phase, the data is gathered from diverse domains with many techniques 
or assets, varying from human to very sophisticated instruments. These differ according to 
the following intelligence-gathering disciplines: 

• Human Intelligence (HUMINT): humans on the ground 

• Geospatial Intelligence (GEOINT): satellite, aerial photography, mapping/terrain 
data 

• Measurement and Signature Intelligence (MASINT): different types of sensors 

• Open Source Intelligence (OSINT): from all open sources 

• Signals Intelligence (SIGINT): intercepting the signals 

• Technical Intelligence (TECHINT): analysis of technical information of the weapons
and equipment used by foreign nations

• Cyber Intelligence/Digital Network Intelligence (CYBINT/DNINT): cyber space

• Financial Intelligence (FININT): analysis of monetary transactions 

Parallel to the advancement of technology, the capabilities of the collection assets, especially
technical ones, constantly improve. However, the required resolution of the information
collected still prevents the agencies from collecting from all possible sources, e.g., geographical
areas, spectrum, Internet traffic. As for an intelligence satellite, if it is not geostationary
(i.e., its movement is not synchronized with the earth’s rotation), it can only look at a portion
of the area for a limited time period. In addition, collecting or processing data from
a source may be costly due to encryption, deception, or other denial techniques [4]. For example,
people may exchange encrypted messages within a social communication network.
Here, allocating collection efforts to the candidate sources has to be done wisely to maximize
the value obtained. These typical situations pose an exploration/exploitation dilemma
similar to that of a Multi-Armed Bandit (MAB) problem, as discussed in Section 2.2.


2.1.4 Intelligence Processing 

In the collection phase, a large amount of data (especially from SIGINT) is obtained that
requires processing before being delivered to an analyst. Processing may include organizing,
structuring, or translating the data. Given the constraints on the processing efforts and
the large volume of data, complete analysis may be infeasible. Even time-critical data may
be left untouched until a processing effort is allocated.


2.1.5 Intelligence Analysis 

The flood of information may keep analysts busy just reading the incoming information
without producing any intelligence [10]. Processed intelligence is first put before the analysts.
Daily, they look at the information coming from different sources. Naturally, analytical
capacities often lag behind the collection and processing capacities [4]. Given the time
constraint and the difficulty of analyzing all the information, the exploration and exploitation
of the relevant sources is also pertinent to the analysis phase.
2.2 Online Learning

Section 2.2 defines online learning and examines how it can be applied to intelligence. 


2.2.1 Overview 

Although it was first defined in machine-learning literature, online learning can be applied 
to other areas such as optimization, game theory, and statistical modeling. 

According to [11], “The goal of online learning is to make a sequence of accurate predictions
given knowledge of the correct answer to previous prediction tasks and possibly
additional available information.” It is used when learning with training data is infeasible
or the data is non-stationary. Online learning is also used to adapt to the changes in the
environment. Learning as one goes along can be more robust than specifying a model and 
using mathematical optimization [12]. 


2.2.2 Multi-Armed Bandit Problem 

The MAB problem is one particular setting for online learning. In a MAB problem, an 
agent chooses one of K machines (bandits) to play in each game (iteration). In order to 
maximize his gain, the agent has to allocate his money wisely between exploring the good 
bandits and exploiting the information learned. 

The problem is important when the decision maker has a budget constraint that prevents 
him from learning the truth about each alternative before making a decision. Important
applications of the bandit model include the following:

• Clinical trials investigating the effects of different experimental treatments while 
minimizing patient losses [13] 

• Adaptive routing efforts for minimizing delays in a network [14] 

• Financial portfolio design [15] 

• Resource allocation to various projects given uncertainty about the difficulty and 
profit of each possibility [16] 
2.2.3 Exponential-weight Algorithm for Exploration and Exploitation

Exponential-weight Algorithm for Exploration and Exploitation (Exp3) [17] can be used to 
approximately solve the MAB problem as previously described. The performance of Exp3 
is measured by weak regret, the difference between the total rewards accumulated by the 
best machine and the sum of the obtained rewards throughout the game history [17]. 


Figure 2.3 shows the pseudo-code of the algorithm [17]. An obvious downside of the
algorithm is that the parameter $\gamma$, which represents the uniformly allocated part of the
probability to the machines, needs to be provided in advance. As proved by Auer et al. [17],
the optimal $\gamma$ is calculated as follows:

$\gamma = \min\left\{1, \sqrt{\frac{K \ln K}{(e - 1)\, g}}\right\}$

Here, parameter $g$ is the upper bound of the total weak regret after a prospected time
horizon $T$ [17]. When rewards are in the interval [0,1], the maximum regret cannot be
greater than $T$. Therefore, we can replace $g$ with $T$ in the equation.


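Parameters: real $\gamma \in (0, 1]$

Initialization: $w_i(1) = 1$ for $i = 1, \ldots, K$

For each $t = 1, 2, \ldots$

• set $p_i(t) = (1 - \gamma) \frac{w_i(t)}{\sum_{j=1}^{K} w_j(t)} + \frac{\gamma}{K}$ for $i = 1, \ldots, K$

• draw $i_t$ randomly according to the probabilities $p_1(t), \ldots, p_K(t)$

• receive reward $x_{i_t}(t) \in [0, 1]$

• for $j = 1, \ldots, K$ set

$\hat{x}_j(t) = x_j(t) / p_j(t)$ if $j = i_t$, otherwise $0$

$w_j(t + 1) = w_j(t) \exp(\gamma \hat{x}_j(t) / K)$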

Figure 2.3: Pseudocode of Exp3 Algorithm. Adapted from [17]. 
In other words, decisions are selected randomly from a probability mass function of bandits,
which is updated at each iteration according to the reward obtained. One salient characteristic
of the algorithm is that a portion of the probability is allocated uniformly to all of the
bandits so as to keep exploring the bandits.
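
As an illustration only, the Exp3 update described in this section can be sketched in Python as follows. The function names `exp3` and `default_gamma`, the callback `reward_fn`, and the per-round rescaling of the weights (added purely to avoid floating-point overflow; it leaves the selection probabilities unchanged) are assumptions of this sketch and are not part of [17].

```python
import math
import random


def default_gamma(K, T):
    """Hypothetical helper: gamma = min{1, sqrt(K ln K / ((e - 1) g))} with g replaced by T."""
    return min(1.0, math.sqrt(K * math.log(K) / ((math.e - 1.0) * T)))


def exp3(reward_fn, K, T, gamma, rng=None):
    """Play T rounds over K machines; reward_fn(i, t) must return a reward in [0, 1]."""
    rng = rng or random.Random()
    weights = [1.0] * K
    pulls = [0] * K
    total_reward = 0.0
    for t in range(T):
        w_sum = sum(weights)
        # Weight-proportional probabilities mixed with a uniform share gamma / K.
        probs = [(1.0 - gamma) * w / w_sum + gamma / K for w in weights]
        i = rng.choices(range(K), weights=probs)[0]
        x = reward_fn(i, t)
        # Importance-weighted reward estimate; it is zero for every machine not played.
        x_hat = x / probs[i]
        weights[i] *= math.exp(gamma * x_hat / K)
        # Rescale so the largest weight is 1 (sketch-only safeguard against overflow).
        w_max = max(weights)
        weights = [w / w_max for w in weights]
        pulls[i] += 1
        total_reward += x
    return total_reward, pulls
```

A caller would supply its own reward callback, for example one that returns 1.0 with a machine-specific probability and 0.0 otherwise, and could pass `default_gamma(K, T)` when the horizon is known in advance.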


2.2.4 Thompson Sampling 

TS, as described in [18], is a heuristic method that can be utilized to address the exploration
and exploitation tradeoff posed by MAB. The idea behind TS is to choose the action
(choice of a machine to play) that has the largest expected reward according to the posterior
reward distributions of the actions. Since the expectations cannot be analytically computed
and it is easier to sample from a posterior distribution, actions are drawn randomly from
the corresponding posterior distributions in each time period; then, the belief distributions
are updated using past observations (Bayesian update).

TS is described in [19] as follows: 

Consider a set of actions $\mathcal{A}$ and rewards in $\mathbb{R}$. In each round, the player chooses an action
$a \in \mathcal{A}$ and obtains a reward $r \in \mathbb{R}$ following a distribution that depends on the issued action.
The aim of the player is to play actions such as to maximize the cumulative rewards.

The following are elements of Thompson sampling:

• A likelihood function $P(r \mid \theta, a)$;

• A set $\Theta$ of parameters $\theta$ of the distribution of $r$;

• A prior distribution $P(\theta)$ on these parameters;

• Past observations $\mathcal{D} = \{(a; r)\}$;

• A posterior distribution $P(\theta \mid \mathcal{D}) \propto P(\mathcal{D} \mid \theta) P(\theta)$.

TS consists in playing the action $a^* \in \mathcal{A}$ according to the probability that it maximizes the
expected reward, i.e., $\int \mathbb{I}\left[E(r \mid a^*, \theta) = \max_{a'} E(r \mid a', \theta)\right] P(\theta \mid \mathcal{D})\, d\theta$. In practice, the rule is
implemented by sampling, in each time period, a parameter $\theta^*$ from the posterior $P(\theta \mid \mathcal{D})$
and choosing the action $a^*$ that maximizes $E[r \mid \theta^*, a^*]$, i.e., the expected reward given the
parameter and the action [20]. In words, the player selects his beliefs randomly from
posteriors, and then acts optimally according to them [20].
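
The following sketch, written under simplifying assumptions, illustrates why sampling a parameter and acting optimally on it plays each action with the probability that it is the best one. It assumes Bernoulli rewards with independent Beta posteriors, one per action; the function names and the Monte Carlo sample size are hypothetical choices made for this illustration.

```python
import random


def thompson_choice(posteriors, rng=None):
    """One TS decision: draw theta* from the posterior, then act optimally for it.

    posteriors: list of (alpha, beta) pairs, one Beta posterior per action.
    """
    rng = rng or random.Random()
    theta = [rng.betavariate(a, b) for a, b in posteriors]
    # With Bernoulli rewards, E[r | theta, a] = theta[a], so the best action is the argmax.
    return max(range(len(theta)), key=theta.__getitem__)


def optimal_action_probabilities(posteriors, n_draws=10000, rng=None):
    """Monte Carlo estimate of the probability that each action maximizes the expected reward."""
    rng = rng or random.Random()
    wins = [0] * len(posteriors)
    for _ in range(n_draws):
        wins[thompson_choice(posteriors, rng)] += 1
    return [w / n_draws for w in wins]
```

Repeated calls to `thompson_choice` select each action with a frequency that approximates the integral above, which is what `optimal_action_probabilities` estimates.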
TS was not common in the literature until recently. Chapelle and Li [19] present “empirical
results using TS on simulated and real data, and show that it is highly competitive,”
compared to other known algorithms, e.g., Upper Confidence Bound (UCB), and robust to
observation delays. Furthermore, TS is easy to implement and takes no parameter that has
to be determined in advance, unlike Exp3.


2.3 Intelligence and Online Learning 

As discussed in Section 1 and demonstrated in Figure 2.2, the data moves through the cycle
until its ultimate consumption by the consumer and is subject to filtration. Not all of the collected
data will eventually become full-fledged intelligence. It may be discarded as irrelevant or
insignificant after being collected, processed, or analyzed. It may not even be collected in
the first place because of its presence in an irrelevant domain.

The amount or number of items that are discarded as irrelevant in any phase of the intelligence
cycle can be an indicator of the efficiency of the process. In other words, we need
to collect, process, and analyze the least amount of data necessary to produce a certain
intelligence without wasting any effort. This can be achieved by collecting the data from
the domain in an adaptive manner. The potential of the domain’s parts can be learned as
the data are collected and fed into the cycle, which in turn can redirect the collection asset
to more promising parts of the domain. The same logic is applicable for the processing and
analyzing phases.

Online learning principles offer promising guidance regarding how this filtration can be
done. The inherent exploration and exploitation dilemmas that exist in all collection, processing,
and analyzing efforts are conducive to approaching the filtration as an online learning
setting such as MAB.

Different settings exposed by collection, processing, and analyzing efforts, or even by different
techniques employed in each effort, may lead to different assumptions regarding the
dependence of the sources (both temporal and spatial). An optimal learning approach can
be modeled using modifications of existing learning algorithms/heuristics, such as Exp3
and TS, in pursuit of efficient solutions resulting in the maximum gain with the available
collecting, processing, or analyzing capacity.
2.4 Related Studies

To the best of our knowledge, there are three Naval Postgraduate School (NPS) master’s 
theses that touch upon efficient intelligence collection and processing. 


Costica [21] considers the bottleneck or congestion caused by a huge information flow 
and proposes a tandem queue model for a preliminary classification of intelligence items 
regarding their relevance to an intelligence request. 

Nevo [22] specifies a network model for social network communication and treats the edges 
as the sources. To maximize the relevant data discovered, he compares the performances 
of several learning algorithms under the time constraints. 

Ellis [23] analyzes the performance of some learning algorithms in detecting relevant conversations
from an intercepted social communication network, as represented by the model
developed by Nevo.
CHAPTER 3:
Model 


In this chapter, we set the framework for our model and the parameter estimation approach,
which together address efficient data collection.


3.1 Setting 

In order to keep the developments as generic as possible, we define several terms. As described
in Chapter 2, the intelligence cycle includes the collection, processing, and analysis
stages. For each of these stages, items originate from different sources. Items that are
examined require a certain amount of resources (capacity), depending on the cycle stage.

For instance, the source could be a certain geographical area. Items from an area are
collected with a UAV, processed using specific algorithms at headquarters, and further analyzed
by a human analyst. To keep the model as simple as possible, in this work, we do
not consider specific issues for each setting (e.g., UAV travel time between non-adjacent
geographic locations). In Table 3.1, we include several other illustrative examples.


Table 3.1: Examples for Sources and Capacities (Resources) for Different Phases

Phase      | Source (Where to sample)                                            | Resource
Collection | A geographical area; a frequency band; an edge of a social network  | A satellite; a signal interceptor
Processing | Data aggregated from the collection phase                           | Decryption tool; automatic translator
Analysis   | Processed data; translated, decrypted message; restructured data    | Human analysts
3.2 Model


There are $S$ sources assumed to be independent. In each time period $t$, source $s \in \{1, \ldots, S\}$
generates a relevant item with probability $p_s$, assumed constant in time, independently of
past observations. Only $q < S$ (but oftentimes, $q \ll S$) sources can be explored each time
period. For simplicity, we assume that there are no misjudgments, meaning that there are
no classification mistakes by the resource employed to examine the item (see Table 3.1).
Exploring a relevant item yields a reward of 1, while non-relevant items yield no reward, or
a reward of 0. We also assume that the cost of exploring an item from a source is fixed and
equal for all sources and is, thus, not explicitly considered in the model. The measure of
performance is the total expected reward over some finite horizon $T$. Hence, if the values
of $p_s$ for all sources $s$ were known, then simply exploring the $q$ sources with the largest
probability $p_s$ of yielding a relevant item would maximize the expected reward.
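
As a small worked illustration of this full-information benchmark, the sketch below picks the $q$ sources with the largest $p_s$ and computes the resulting total expected reward over $T$ periods. The function names and the tie-breaking rule (lower index first) are assumptions made only for this example.

```python
def oracle_sources(p, q):
    """Indices of the q sources with the largest p_s (ties broken by lower index)."""
    order = sorted(range(len(p)), key=lambda s: (-p[s], s))
    return sorted(order[:q])


def oracle_expected_reward(p, q, T):
    """Total expected reward over T periods when the q best sources are explored every period."""
    return T * sum(p[s] for s in oracle_sources(p, q))
```

For instance, with `p = [0.1, 0.7, 0.3, 0.9]` and `q = 2`, the benchmark explores sources 1 and 3 (zero-based) in every period, so the expected reward per period is the sum of their two probabilities.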

However, in most realistic situations, the values of $p_s$ are unknown because there is little
available information from the source. In this work, we assume that existing and past
information can be subsumed into a prior distribution for $p_s$ for each source $s$. On the
one extreme, if the analyst has a very high degree of certainty on the value of $p_s$ for some
source $s$, then he would put a prior density centered at some value in (0,1), with most of
the mass concentrated around that value. If, on the other hand, the analyst knows nothing
about $p_s$, then he would assume a uniform prior. In other words, we model the probabilities
of yielding a relevant item for each source, $p_1, \ldots, p_S$, as themselves being randomly drawn
from some distribution that may depend on the source $s$.

The beta distribution, with Probability Density Function (pdf)

$f(p; \alpha_s, \beta_s) \propto p^{\alpha_s - 1} (1 - p)^{\beta_s - 1}$,

is proportional to the likelihood of $p$ given $\alpha_s - 1$ successes and $\beta_s - 1$ failures. The beta distribution
is appealing because it is conjugate with the Bernoulli distribution, which is associated
with {0,1} reward situations such as ours. The beta distribution is continuous over 0 to 1,
with mean $\alpha_s / (\alpha_s + \beta_s)$ and variance $\alpha_s \beta_s / [(\alpha_s + \beta_s)^2 (\alpha_s + \beta_s + 1)]$. Under the assumption
that $p_s \sim \mathrm{Beta}(\alpha_s, \beta_s)$, the analyst may use historical data to estimate the parameters $\alpha_s$ and
$\beta_s$. If there is no historical data, choosing $\alpha_s = \beta_s = 1$ amounts to assuming a uniform prior.
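
To make the role of the two prior parameters concrete, the short sketch below evaluates the beta mean and variance formulas for a uniform prior and for a more informative one. The helper names and the example parameter values are illustrative assumptions, not values taken from this work.

```python
def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta): alpha / (alpha + beta)."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Variance of Beta(alpha, beta): alpha * beta / ((alpha + beta)^2 (alpha + beta + 1))."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))


# Uniform prior (no historical data): mean 1/2, variance 1/12.
uniform_mean, uniform_var = beta_mean(1, 1), beta_variance(1, 1)

# Hypothetical informative prior: same form, mass concentrated near 0.8.
informed_mean, informed_var = beta_mean(80, 20), beta_variance(80, 20)
```

The informative prior has a far smaller variance than the uniform one, which is how a high degree of prior certainty about a source is expressed in this parameterization.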
A source that with high certainty yields non-relevant items would have $\beta_s \gg \alpha_s \geq 1$, while
one that with low certainty generates relevant items would have $1 > \alpha_s > \beta_s > 0$.

Source gets explored from 

Bernoulli (pi), 

and the posterior distribution for ps becomes Beta(aj+v:j,j8^ + 1 — a:^), on G {0,1}. 

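Hence, after $n_{s,t}$ explorations of items generated by source $s$ by period $t$, we have

$$p_s \mid x_{s,1}, \ldots, x_{s,n_{s,t}} \sim \mathrm{Beta}(\alpha_s + y_{s,t},\; \beta_s + n_{s,t} - y_{s,t}),$$

where $y_{s,t} = \sum_{i=1}^{n_{s,t}} x_{s,i}$ is the number of relevant items generated by source $s$ by time $t$.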

As source $s$ is explored, the analyst gains a degree of certainty about its probability of
generating relevant items, since its variance, equal to

$$\frac{(\alpha_s + y_{s,t})(\beta_s + n_{s,t} - y_{s,t})}{(\alpha_s + \beta_s + n_{s,t})^2 (\alpha_s + \beta_s + n_{s,t} + 1)} = O\!\left(\frac{1}{n_{s,t}}\right),$$

decays to zero at a rate of order one over the number of explorations at that source.
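
The conjugate update and the variance formula above can be illustrated with a short sketch. The function names `update_posterior` and `beta_variance` are hypothetical and introduced only for this illustration; the reward sequence is made up.

```python
def update_posterior(alpha, beta, rewards):
    """Return the Beta posterior parameters after observing 0/1 rewards."""
    y = sum(rewards)   # number of relevant items, y_{s,t}
    n = len(rewards)   # number of explorations, n_{s,t}
    return alpha + y, beta + n - y


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) random variable."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))


# Uniform prior Beta(1, 1), then 3 relevant items in 10 explorations.
a, b = update_posterior(1.0, 1.0, [1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
print(a, b)
print(beta_variance(1.0, 1.0), beta_variance(a, b))
```

In this example the posterior is Beta(4, 8), and its variance is smaller than that of the uniform prior, in line with the decay described above.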

Each time period the analyst has to decide which sources to explore, in a way that allows
her or him to balance the sources with high uncertainty about $p_s$ (i.e., $\alpha_s + \beta_s + n_{s,t}$ small)
against those that are likely to yield relevant items (i.e., those with $\alpha_s + y_{s,t} \gg \beta_s + n_{s,t} - y_{s,t}$).
Intuitively, the analyst should choose to explore the items from the $q$ sources with the
largest chance of generating relevant items.

Recently, it has been shown that an approach known as Thompson Sampling [24] (referred 
to as TS throughout this work) achieves the best learning rate for this situation from a 
theoretical standpoint, as indicated by [25]. Our model is very similar to TS, with one 
main difference: whereas in TS, one can only sample a single source at a time, here, we 
can sample q sources per time period. TS examines items from sources with the greatest 
probability of generating a relevant item. This is done in two steps, first, by drawing a 
random p from the posterior distribution of each source and, second, by exploring items 
with the largest values of p drawn in the first stage. We summarize these steps as follows: 
1. Draw a sample $p_s$ for each source $s$ from a Beta($\alpha_s + y_{s,t}$, $\beta_s + n_{s,t} - y_{s,t}$), and sort
the values in increasing order $p_{(1)}, p_{(2)}, \ldots, p_{(S)}$.

2. From the sources corresponding to $p_{(S-q+1)}, \ldots, p_{(S)}$, draw a sample from Bernoulli($p_s$).
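
The selection step can be sketched as follows. This is a minimal illustration, assuming the posterior parameters are held in two plain lists; the function name `thompson_select` and the example parameter values are hypothetical.

```python
import random


def thompson_select(alpha, beta, q, rng):
    """Return the indices of the q sources with the largest posterior draws."""
    # Step 1: one draw per source from its current Beta posterior.
    draws = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
    # Sort source indices by their draw, in increasing order.
    ranked = sorted(range(len(draws)), key=lambda s: draws[s])
    # Step 2: keep the q sources with the largest draws (q >= 1 assumed).
    return ranked[-q:]


rng = random.Random(0)
alpha = [1.0, 50.0, 1.0, 2.0, 30.0]   # made-up posterior parameters
beta = [50.0, 1.0, 1.0, 2.0, 1.0]
chosen = thompson_select(alpha, beta, 2, rng)
print(chosen)
```

Because the choice is based on random draws rather than posterior means, sources with few explorations still have a chance of being selected.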


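Agrawal and Goyal [24] have shown that, for $q = 1$, the expected regret

$$E[\mathrm{Regret}(T)] = \sum_{t=1}^{T} E\left[\max_s\{p_s\} - p_{s_t}\right],$$

where $p_{s_t}$ is the probability of yielding a relevant item for the source sampled at time $t$, grows like

$$E[\mathrm{Regret}(T)] = O\left(\left(\sum_{i=1,\, i \neq s^*}^{S} \frac{1}{(\max_s\{p_s\} - p_i)^2}\right) \log T\right), \qquad (3.1)$$
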

where $s^* = \arg\max_s p_s$, and a function $f(T) = O(\log T)$ if $|f(T)| < k \log T$ for all $T > 0$ and
some $0 < k < \infty$. In words, the expected difference between the total reward gained by
exploring the best source if we knew all the reward probabilities $p_s$ in advance, and the
total reward obtained by the previous algorithm (that is, the expected regret), grows at
order $O(S \log T)$. This implies that the average regret goes to zero, meaning that learning
occurs. Lai and Robbins [25] show that this learning rate is optimal, in the sense that it is
not possible to have the regret grow slower than $O(\log T)$. Notably, the dominating term
driving the $O(\cdot)$ rate of growth in the regret is one divided by the square of the smallest
difference between the best and second best sources. This is because the algorithm takes a
long time to find the best source when its probability of yielding a relevant item is similar
to that of the second best source.


We view $q$ as a decision variable for the intelligence organization. In this setting, the regret
at time $t$ is the (random) difference between the reward obtained by drawing a sample from each
of the best $q$ sources and the reward attained by exploring the selected sources, where
$p_{1,t}, \ldots, p_{q,t}$ are the probabilities of yielding a relevant item for the sources selected to explore
at time $t$.
Thus, the expected regret is

$$E[\mathrm{Regret}(T)] = \sum_{t=1}^{T} E\left[\sum_{s=S-q+1}^{S} p_{(s)} - \sum_{i=1}^{q} p_{i,t}\right],$$

where $p_{(1)} \le p_{(2)} \le \cdots \le p_{(S)}$.

Because the intelligence organization typically pools its resources, the value of $q$ (i.e., a
measure of the resources devoted to the request for information under consideration) can
change over time, but its average value is upper bounded. This relaxation motivates the
following questions. How should $q$ change over time, subject to a bound on its mean
value? What is the risk associated with any given $q$? How should we adjust $q$ in future time
periods? These questions are the subject of Chapter 4.


3.3 Parameter Estimation after the Data Is Available 

In some cases, it makes sense for the analyst to assume that the items from a source come
from a mixture of distributions, each distribution corresponding to a particular component
(e.g., age group). More precisely, the analyst assumes that $p_s$ has a mixture distribution,

$$f(p) = \sum_{i=1}^{n} w_i f_i(p),$$

where $n$ is the number of components, and $w_i$ and $f_i(\cdot)$ are the weight and density function of the $i$th
component, respectively, with the constraint $\sum_i w_i = 1$.

This is often complicated by the fact that the source class may not be observable; that is,
the analyst has a collection of zeros and ones from a source, but does not know the
component of each item. To handle this scenario, we use the Expectation Maximization (EM)
algorithm. EM can be used to estimate the weights and the parameters of the component
distributions when the information about which component an observation belongs to is
missing [26]. Employing EM we can do the following:

• Estimate the sizes of each component (weights)

• Estimate the parameters of each component

• Estimate the component of each observation


EM alternates between two steps: expectation and maximization. In the first step, a
conditional expectation is calculated. In the second step, the parameters that maximize the
expectation are found [27]. Under suitable regularity conditions, the estimates converge to a
local maximum of the likelihood [27]. For this purpose, we use the betareg package of R, which
supports beta mixture models. For the details of implementation for beta mixtures within
the package, see Grun et al. [28].

For a formal explanation of EM, let $X$ be the vector of observations from the mixture
distribution, and let $Z$ be the vector indicating the components, which are unknown (hidden).
$\theta_t$ is the vector of parameters to be estimated in iteration $t$. Then:

Expectation Step: Determine the conditional expectation $E_{Z \mid X, \theta_t}[\log L(X, Z \mid \theta)]$

Maximization Step: Find the $\theta$ that maximizes this expectation

In our case, the densities $f_i(\cdot)$ are beta with parameters $\alpha_i$ and $\beta_i$. Although we
show the results for beta mixtures, which are more convenient for our purpose, EM for other
mixture models can be implemented, or available packages can be employed.

Expectation Step: Calculate

$$t_i = E(\log p_i \mid X_i = x_i) = \Psi(\alpha_{old} + x_i) - \Psi(\alpha_{old} + \beta_{old} + N),$$

$$s_i = E(\log(1 - p_i) \mid X_i = x_i) = \Psi(\beta_{old} + N - x_i) - \Psi(\alpha_{old} + \beta_{old} + N),$$

for $i = 1, \ldots, n$, where $N$ is the number of trials.

Karlis [29] proposes a scheme to update the current estimates of the parameters of Beta 
distribution as follows: 
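Maximization Step: Make a one-step-ahead Newton-Raphson iteration for maximum likelihood (ML)
estimation of a beta density using the expectations of the E-step. To do so, calculate

$$\bar{t} = \frac{\sum_{i=1}^{n} t_i}{n}, \qquad \bar{s} = \frac{\sum_{i=1}^{n} s_i}{n},$$

and, then, update the estimates as

$$\alpha_{new} = \alpha_{old} - \frac{\Psi(\alpha_{old}) - \Psi(\alpha_{old} + \beta_{old}) - \bar{t}}{\Psi_3(\alpha_{old}) - \Psi_3(\alpha_{old} + \beta_{old})},$$

$$\beta_{new} = \beta_{old} - \frac{\Psi(\beta_{old}) - \Psi(\alpha_{new} + \beta_{old}) - \bar{s}}{\Psi_3(\beta_{old}) - \Psi_3(\alpha_{new} + \beta_{old})}.$$

$\Psi(\cdot)$ denotes the digamma function:

$$\Psi(x) = \frac{d}{dx} \ln \Gamma(x) = \frac{\Gamma'(x)}{\Gamma(x)},$$

$\Psi_3(\cdot)$ denotes the trigamma function:

$$\Psi_3(x) = \frac{d^2}{dx^2} \ln \Gamma(x),$$

and $\Gamma(\cdot)$ denotes the gamma function for positive real numbers:

$$\Gamma(x) = \int_0^{\infty} u^{x-1} e^{-u} \, du.$$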
CHAPTER 4:
Analysis


As our main contribution in this thesis to the ongoing effort to collect, process, and analyze 
data for intelligence, we extend the TS framework, so more than one item can be explored 
each time period. As shown in Chapter 3, the capacity $q$ is a decision variable for the
intelligence organization. While there is no variable cost for exploring a source, there is a 
limit to how large q can be in each time period or on average. The rationale for this is that 
the resources of the intelligence organization (e.g., technological or human) are viewed as 
a fixed cost, to be used at will. 

Exploring more than one source per time period triggers a change in the interpretation of 
expected regret. In our case, we calculate expected regret as follows: 

$$E[\mathrm{Regret}(T)] = \sum_{t=1}^{T} E\left[\sum_{s=S-q+1}^{S} p_{(s)} - \sum_{i=1}^{q} p_{i,t}\right],$$

as shown in Chapter 3. In this setting, the regret at time $t$ is the (random) difference between the
reward obtained by drawing a sample from each of the best $q$ (unknown) sources and the
reward attained by exploring the selected sources, where $p_{1,t}, \ldots, p_{q,t}$ are the probabilities
of yielding a relevant item for the sources selected to explore at time $t$.

In this thesis, we sample q different sources each time period, but there are other possible 
ways to sample q sources, such as sampling q items from the same source. However, the 
operational settings in which such an approach is possible are limited and, thus, omitted 
from consideration here. 

In this chapter, we explain the algorithm in detail, and we interpret the results obtained 
from the simulations. First, in Section 4.2, we look at the performance of the algorithm 
when the ps are from either a pure or mixture population. The goal in that section is to get 
a feel for the algorithm’s behavior. Does the expected regret grow logarithmically in time? 
How is the expected regret affected by prior knowledge? 
Then, in Section 4.3, we treat $q$ as a decision variable, allowing it to change over time. We
use a normal approximation to find the capacity required to detect a certain percentage of
the relevant items. While relatively simple, this is another contribution of our analysis.

In Section 4.4, we develop the notion of risk. We view risk as the expected fraction of
relevant items left unexplored in a time period. We analyze the trade-off between the resources
allocated and the risk assumed.

In some cases, it makes sense for the analyst to have a prior for $p_s$ that is a mixture
distribution, for instance, when intelligence agencies categorize a population into an innocent
group and a dangerous group, or when a satellite takes pictures in areas of interest and of past
non-interest. Hence, it is important to include such scenarios in our analysis. With this
motivation, in Section 4.5, we employ the EM algorithm in situations wherein the intelligence
analyst has historical data that stem from a mixture distribution and can be used to build a prior
for the parameters $\alpha_s$ and $\beta_s$. While this section is somewhat disconnected from the other parts of
this chapter, we view it as important, because in most realistic situations, there is past data
available.


4.1 Algorithm 

In this section we discuss the algorithms employed for the analysis. Because the
intelligence organization typically pools its resources, the value of $q$ (i.e., a measure of the
resources devoted to the request for information under consideration) can change over time,
but its average value is upper bounded. This relaxation motivates the following questions.
How does $q$ change over time, subject to a bound on its mean value? What is the risk
associated with any given $q$? How should we adjust $q$ in future time periods? These questions are
the subject of this chapter.

We assume that the probability that source $s$ generates a relevant item ($p_s$) comes from
some arbitrary distribution with support over [0,1], meaning that for source $s$ nature draws
a sample $p_s$, which then is used to generate the rewards from a Bernoulli distribution with
parameter $p_s$. The analyst updates the beta parameters as discussed in Chapter 3. From the
analyst's standpoint, he or she has a prior distribution for $p_s$, which does not necessarily
coincide with the true underlying distribution of $p_s$. The analyst faces a number of different
scenarios, depending on whether there is historical data, on the assumptions he or she
makes about the true distribution of $p_s$, and on the type of information revealed:

1. There is no prior information about $p_s$, so the analyst assumes a uniform distribution
over (0,1), i.e., Beta(1,1). The $p_s$ are drawn from some arbitrary distribution, for
example, a mixture of a Beta and a triangular density over [0,1], as shown in Figures 4.6
and 4.7.

2. Nature sets the distribution of $p_s$ as Beta, and the analyst knows this. If there is no
historical data, we end up in the first scenario (this is the case of Figure 4.2). If there
is historical data available, the analyst estimates $\alpha_s$ and $\beta_s$ using maximum likelihood
estimation.

3. The analyst knows that nature uses a mixture of Beta distributions for $p_s$, for
instance, $p_s \sim 0.9\,\mathrm{Beta}(1,5) + 0.1\,\mathrm{Beta}(5,1)$. The analyst observes the sequence of zeros
and ones from each source. However, the analyst does not know from which of the
components, either Beta(1,5) or Beta(5,1) in the preceding example, the data
originates, nor does he or she know the mixing probabilities (0.9 and 0.1 in the example).
In this case, the analyst uses historical data along with the EM algorithm to estimate
the Beta parameters ($\alpha_s$ and $\beta_s$, which appear in the algorithm of Figure 4.1, in step 1(b)iii)
as well as the mixing probabilities. This scenario is discussed in Section 4.5.

The main algorithm is shown in Figure 4.1. 
1. Initialization:

(a) Set $t = 0$.

(b) For each source $s$,

i. Draw $p_s$ from some arbitrary density.

ii. Set $y_{s,t} = 0$ and $n_{s,t} = 0$.

iii. Set $\alpha_s = 1$, $\beta_s = 1$ (or use EM to estimate them).

2. Draw a sample $p_s$ for each source $s$ from a Beta($\alpha_s + y_{s,t}$, $\beta_s + n_{s,t} - y_{s,t}$), and sort
the values in increasing order $p_{(1)}, p_{(2)}, \ldots, p_{(S)}$.

3. From the $q$ sources corresponding to $p_{(S-q+1)}, \ldots, p_{(S)}$, draw a sample $x_s$ from
Bernoulli($p_s$).

4. Set $y_{s,t} = y_{s,t-1} + 1$ if $x_s = 1$ for sampled sources.

5. Set $n_{s,t} = n_{s,t-1} + 1$ for all $q$ sampled sources.

6. Set $t = t + 1$.

7. Go back to 2.

Figure 4.1: Pseudocode of the Algorithm for Arbitrary Priors
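
The loop in Figure 4.1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the simulation code used for the results: the function name `simulate_ts`, the seed, and the horizon of 300 periods are hypothetical, and the regret tracked here is the expected (pseudo-)regret computed from the true $p_s$ values rather than from realized rewards.

```python
import random


def simulate_ts(p, q, horizon, rng, alpha0=1.0, beta0=1.0):
    """Run the q-source Thompson Sampling loop; return cumulative regret."""
    S = len(p)
    alpha = [alpha0] * S              # alpha_s + y_{s,t}
    beta = [beta0] * S                # beta_s + n_{s,t} - y_{s,t}
    best = sum(sorted(p)[-q:])        # expected reward of the q best sources
    cumulative, total = [], 0.0
    for _ in range(horizon):
        # Step 2: one posterior draw per source, ranked in increasing order.
        draws = [rng.betavariate(alpha[s], beta[s]) for s in range(S)]
        chosen = sorted(range(S), key=lambda s: draws[s])[-q:]
        # Steps 3 to 5: observe a Bernoulli reward and update the posterior.
        for s in chosen:
            x = 1 if rng.random() < p[s] else 0
            alpha[s] += x
            beta[s] += 1 - x
        total += best - sum(p[s] for s in chosen)
        cumulative.append(total)
    return cumulative


rng = random.Random(1)
p = [rng.betavariate(0.02, 0.18) for _ in range(100)]   # nature draws p_s
regret = simulate_ts(p, q=20, horizon=300, rng=rng)
print(regret[-1])
```

Averaging `regret` over many replications with fresh seeds would give curves comparable in spirit to the cumulative regret plots discussed in Section 4.2.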


4.2 Results 

In this section we analyze the performance of the algorithm through numerical experiments.
We consider two scenarios. In the first scenario, the $p_s$ are sampled from one population ($s =
1, \ldots, S$, where $S$ is the number of sources). In the second scenario, the $p_s$ are sampled from two
populations. We show the results in terms of cumulative regret and regret per time period.


4.2.1 When ps are Sampled from One Population 

In this subsection, we treat the simplest scenario, one in which all the parameters ps are 
independent and identically distributed from a Beta distribution. In particular, we assume 
that Ps ~ Beta(0.02,0.18). The density is shown in Figure 4.2. This assumption implies that 
sources have mostly either low or high $p_s$ values and rarely intermediate values, capturing
situations wherein the population rarely produces relevant items, but those items that are
relevant come from a small subset of the population.

The regret realized in each time period is shown in Figure 4.4 and the cumulative regret 
appears in Figure 4.3, with 95% confidence interval bands obtained by 200 simulation 
replications (q = 20 and S = 100). The per-period regret decays toward zero as learning 
becomes realized, and the sources with the largest ps become more likely to be sampled. 
Figure 4.2: Pdf of the Distribution of the ps (plot title: Pdf for One Population; horizontal axis: p_s)

Figure 4.3: Cumulative Regret for One Population (plot title: Regret with One Population)

Figure 4.4: Non-cumulative Regret for One Population (plot title: Non-Cumulative Regret with One Population)
In Figure 4.5, we plot the average cumulative regret over 200 sample paths as a function
of $\log t$, as well as the 95% confidence interval bands. The motivation is Equation 3.1:
the expected cumulative regret grows at order $\log t$. The average cumulative regret does
not appear to grow linearly in $\log t$ for $t \le \exp(6) \approx 400$, but it does thereafter. This is not in
disagreement with Equation 3.1, as it only applies as $t$ grows larger. We believe this is
because we sampled $q = 20$ sources per period while Equation 3.1 applies to the case in
which $q = 1$. In other words, we accrue regret over the poor sources that get sampled
among the 20 selected sources per time period.


Figure 4.5: Regret for One Population (plot title: Regret vs. Log(t) with One Population)

4.2.2 When ps are Sampled from Two Populations

In this subsection, we consider a mixture distribution for the priors, with known mixing 
probabilities. This will be relaxed later on, when we use the EM algorithm to estimate the 
mixing probabilities. This would be appropriate for situations in which a subpopulation 
group is viewed as a source, with the mixing probability representing the weight of the 
subpopulation in the broader population. In particular, we assume that the probability of
producing a relevant item ($p_s$) is drawn from a Beta(0.05, 0.95) distribution for 99% of the
sources and from a Triangular(0,1,1) distribution for the remaining 1%.

The initial prior is Beta(1,1), but as the number of rewards observed grows larger, its impact
becomes relatively smaller and the updated values of $\alpha$ and $\beta$ (cf. Step 2 of the algorithm
in Figure 4.1) eventually force the algorithm to emphasize exploring the best sources.

The densities of both distributions are shown in Figure 4.6 and Figure 4.7. This assumption
is reasonable for screening communication items because most people are innocent
and have a low probability of generating a relevant item (e.g., e-mail or phone
conversation), while a very small percentage of people (e.g., criminals or terrorist suspects) have a
higher probability of generating relevant items. Regret in each time period is shown in Figure 4.9, and
cumulative regret is shown in Figure 4.8. As seen in the figures, the mixture of source
populations produces a regret pattern similar to that of the one-population scenario.
Figure 4.6: Pdf 1

Figure 4.7: Pdf 2

Figure 4.8: Cumulative Regret for Two (Mix) Populations (plot title: Regret with Two Populations)

Figure 4.9: Non-cumulative Regret for Two (Mix) Populations (plot title: Non-Cumulative Regret with Two Populations; horizontal axis: Time)

Figure 4.10: Regret for Two Populations (plot title: Regret vs. Log(t) with Two Populations)

In Figure 4.10, we plot the average cumulative regret as a function of $\log t$, including
95% confidence bands, based on 200 replications, for $q = 20$ sources sampled per time
period. As with Figure 4.5, the expected regret appears to grow linearly in $\log t$ only for values of
$t$ larger than approximately 60. In this case, we have the extra difference, relative to Equation 3.1,
that the prior distribution of the sources is a mixture of a beta and a triangular density, but it is
unknown to the analyst and initialized as Beta(1,1) in the simulation.

4.3 Determining the Number of Sources to Sample (q)

The output of a single replication of the simulation for $S = 100$ sources, during which the
analyst samples $q = 10$ sources per time period, is shown in Figure 4.11 ($p_s$ are sampled
as described in Subsection 4.2.2). We observe that after an exploration period, the regret
per time period stabilizes under the red curve, which shows the actual number of relevant
items throughout the time periods. The blue line represents the number of relevant items
discovered.

The larger the number of sources explored per time period, the larger the
expected number of relevant items discovered. Accordingly, the blue line approaches the
red line when $q$ is increased from 10 (Figure 4.11) to 20 (Figure 4.12). Another observation
is that the increase in $q$ almost cuts in half the number of time periods required for the curve
to stabilize (from about 100 to 50).

This raises the question: How does the expected regret change as a function of the number
of sources explored per time period ($q$)? Our goal in this subsection is to address this issue.

In Figure 4.13, we show the regrets obtained for different $q$ values ($q = 1, 10, 20, 40, 60, 80$).
After the exploration phase, the regret grows like a constant $C$ times the logarithm of $T$,
wherein the proportionality constant depends on $q$. As shown in Equation 3.1, when $q = 1$ the growth
constant depends on how similar the best source is to the other sources.
However, when $q$ takes on other values, we conjecture that it depends on how similar the best
$q$ sources are to the rest of the sources. We observe that the growth constant for $q = 1$ is
greater than that for $q = 10$. The rationale for this is that we sample $q > 1$ sources per
period, so that there is extra regret due to the poor sources that get sampled among the $q$
selected.
Figure 4.11: q = 10 (horizontal axis: Time)

Figure 4.12: q = 20 (horizontal axis: Time)


Let $Y_t$ be the number of relevant items in period $t$ across all sources. Given the exploration
to date, $Y_t$ is a sum of $S$ independent Bernoulli random variables, $Y_t = \sum_s X_{s,t}$. Each $X_{s,t}$ is
Bernoulli with parameter $p_{s,t} \sim \mathrm{Beta}(\alpha_{s,t}, \beta_{s,t})$, for
$\alpha_{s,t} = \alpha_s + y_{s,t}$ and $\beta_{s,t} = \beta_s + n_{s,t} - y_{s,t}$. The interpretation, as in Chapter 3, is that $\alpha_{s,t}$ is
the prior parameter $\alpha_s$ plus the number of relevant items explored from source $s$ to date,
and $\beta_{s,t}$ is the initial $\beta_s$ plus the number of irrelevant items explored from source $s$ in the initial $t$ periods.
Therefore,

$$E[Y_t \mid \text{exploration in periods } 1, \ldots, t-1] = \sum_{s=1}^{S} \frac{\alpha_{s,t}}{\alpha_{s,t} + \beta_{s,t}}, \qquad (4.1)$$

$$\mathrm{Var}(Y_t \mid \text{exploration in periods } 1, \ldots, t-1) = \sum_{s=1}^{S} \mathrm{Var}(X_{s,t}).$$
Figure 4.13: Regret Obtained by Different q Values


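Since the $p_{s,t}$ are also random, we may employ the total variance formula for $\mathrm{Var}(X_{s,t})$,

$$\begin{aligned}
\mathrm{Var}(X_{s,t}) &= E[\mathrm{Var}(X_{s,t} \mid p_{s,t})] + \mathrm{Var}(E[X_{s,t} \mid p_{s,t}]) \\
&= E[p_{s,t}(1 - p_{s,t})] + \mathrm{Var}(p_{s,t}) \\
&= E[p_{s,t}] - E[p_{s,t}^2] + \frac{\alpha_{s,t}\beta_{s,t}}{(\alpha_{s,t} + \beta_{s,t})^2(\alpha_{s,t} + \beta_{s,t} + 1)} \quad \text{(the } p_{s,t}\text{'s are Beta distributed)} \\
&= \frac{\alpha_{s,t}}{\alpha_{s,t} + \beta_{s,t}} - \frac{\alpha_{s,t}(\alpha_{s,t} + 1)}{(\alpha_{s,t} + \beta_{s,t})(\alpha_{s,t} + \beta_{s,t} + 1)} + \frac{\alpha_{s,t}\beta_{s,t}}{(\alpha_{s,t} + \beta_{s,t})^2(\alpha_{s,t} + \beta_{s,t} + 1)}.
\end{aligned}$$

We conclude that

$$\mathrm{Var}(Y_t \mid \text{exploration in periods } 1, \ldots, t-1) = \sum_{s=1}^{S} \left[ \frac{\alpha_{s,t}}{\alpha_{s,t} + \beta_{s,t}} - \frac{\alpha_{s,t}(\alpha_{s,t} + 1)}{(\alpha_{s,t} + \beta_{s,t})(\alpha_{s,t} + \beta_{s,t} + 1)} + \frac{\alpha_{s,t}\beta_{s,t}}{(\alpha_{s,t} + \beta_{s,t})^2(\alpha_{s,t} + \beta_{s,t} + 1)} \right]. \qquad (4.2)$$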
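
As a cross-check on Equation 4.2, the three terms for a single source can be combined algebraically. The shorthand $a = \alpha_{s,t}$, $b = \beta_{s,t}$, and $m = a/(a+b)$ is introduced only for this note. Under that shorthand, each summand appears to reduce to the variance of a Bernoulli variable whose success probability is the posterior mean $m$:

```latex
\begin{aligned}
\frac{a}{a+b} - \frac{a(a+1)}{(a+b)(a+b+1)} + \frac{ab}{(a+b)^2(a+b+1)}
  &= \frac{ab}{(a+b)(a+b+1)} + \frac{ab}{(a+b)^2(a+b+1)} \\
  &= \frac{ab\,(a+b+1)}{(a+b)^2(a+b+1)} \\
  &= \frac{ab}{(a+b)^2} = m(1-m).
\end{aligned}
```

For example, with $a = 0.02$ and $b = 0.18$ the three terms are $0.1 - 0.085 + 0.075 = 0.09$, which equals $0.1 \times 0.9$.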
Observe that the posterior variance of the total number of relevant items by period $t$
decays toward the variance of a sum of Bernoulli random variables, considered the systemic
variance, as exploration eliminates the variance due to the uncertainty about the $p_s$'s.

The central limit theorem suggests that the total number of relevant items at time t given 
by Yt is approximately normally distributed when the number of sources is large. This 
motivates the study of two different scenarios. In the first case, we assume that the number 
of sources to sample has been decided upfront and must remain constant thereafter. In the 
second scenario, the analyst can change the number of sources to sample dynamically. In 
both cases, the goal of the analyst is to sample as many sources as needed, so he or she 
has a 95% probability of capturing the reward of a source; that is, on average, the analyst 
collects 95% of the total rewards. 

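Thus, we use a normal approximation to provide an upper bound with confidence level $c$:

$$\text{Upper Bound} = E[Y_0] + \Phi^{-1}(c)\,\sigma_{Y_0}, \qquad (4.3)$$

where $E[Y_0]$ is the sum of all the prior means, $\sigma_{Y_0}$ is the standard deviation of $Y_0$ at time
zero (the square root of the variance in Equation 4.2 evaluated at the prior parameters), and
$\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function.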

Using these equations, we are able to provide an upper bound for Yt with confidence level 
c. Then, it will be reasonable for q to be greater than or equal to this bound. Here, c can be 
regarded as another decision variable. 

In order to capture all relevant items after the exploration phase, the allocated capacity
has to be greater than or equal to the number of relevant items in each iteration ($q \ge Y_t$).
Because $Y_t$ is a random variable, it is safe to choose a value for $q$ that is greater than or
equal to the value obtained by the upper bound in Equation 4.3.

The corresponding upper bounds are shown in Table 4.1, which gives the minimum number
of sources to explore, $q$, for a given number of sources $S$, at a 0.99 confidence level
($c$). Although these bounds are for the prior Beta(0.02,0.18), providing similar bounds for any
distribution of the $p_s$ seems straightforward.
Table 4.1: Minimum q Values Required

    S        q
    10       4
    20       6
    50       10
    100      17
    500      66
    1000     123
    10000    1070
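
The bound in Equation 4.3 might be evaluated as in the following sketch. It assumes that every source shares the same Beta prior, that the standard deviation is the square root of the summed per-source variances from Equation 4.2, and that the bound is rounded up to the next integer; the rounding rule and the function name `capacity_bound` are assumptions made for this illustration.

```python
import math
from statistics import NormalDist


def capacity_bound(alpha, beta, c):
    """Normal-approximation upper bound on the number of relevant items."""
    mean = 0.0
    var = 0.0
    for a, b in zip(alpha, beta):
        mean += a / (a + b)
        # Per-source variance, written with the three terms of Equation 4.2.
        var += (a / (a + b)
                - a * (a + 1) / ((a + b) * (a + b + 1))
                + a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean + NormalDist().inv_cdf(c) * math.sqrt(var)


table = {}
for S in (10, 20, 50, 100, 500, 1000, 10000):
    # Same Beta(0.02, 0.18) prior for every source, confidence level 0.99.
    table[S] = math.ceil(capacity_bound([0.02] * S, [0.18] * S, 0.99))
print(table)
```

Under these assumptions the sketch gives values that agree with Table 4.1, for example $q = 17$ for $S = 100$.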


If the analyst must choose the value of $q$ in advance, he or she should follow the
recommendation above. On the other hand, since $\alpha_s$ and $\beta_s$ do change over time as relevant and
non-relevant items are examined, the analyst may change the value of $q$ dynamically.

Next, we analyze the second scenario, wherein the analyst can adjust the number of sources
explored ($q$) dynamically according to the posteriors. If the analyst has a capacity that is at
least as large as the number of sources, then there is no risk of missing any relevant items.
However, some capacity may become idle or useless as the sources are explored over time.
Hence, instead of allocating a constant capacity for all time periods, it is more efficient to
adjust the capacity $q$ as learning occurs.

The idea is similar to the first scenario, namely, to use a normal approximation and to
compute the percentile of the number of relevant items using the posterior mean and variance,
as in Equations 4.1 and 4.2. More precisely, we determine the value of the upper bound:

$$\text{Upper Bound} = E[Y_t \mid \text{exploration in periods } 1, \ldots, t-1] + \Phi^{-1}(c) \sqrt{\mathrm{Var}(Y_t \mid \text{exploration in periods } 1, \ldots, t-1)}, \qquad (4.4)$$

where $E[Y_t \mid \text{exploration in periods } 1, \ldots, t-1]$ and $\mathrm{Var}(Y_t \mid \text{exploration in periods } 1, \ldots, t-1)$
are as defined above. Notably, at time $t$, the posterior parameters for source $s$ are the initial
$\alpha_s$ plus the number of relevant items detected to date and the initial $\beta_s$ plus the number of non-relevant
items up to time $t$.
We summarize the algorithm when the capacity $q$ is selected dynamically, as shown in
Figure 4.16. We use the posterior mean plus two standard deviations ($E[Y_t] + 2\sigma_{Y_t}$) to update
$q$. In the simulation, we used Beta(0.02,0.18) to sample $p_s$. However, assuming no knowledge,
we initialized the prior distributions for each source as Beta(1,1). The output for one
replication of the simulation is shown in Figure 4.14, and the corresponding change in $q$ is shown
in Figure 4.15. The red line represents the actual number of relevant items generated by
all sources throughout the time horizon, and the blue line represents the number of relevant
items discovered by the algorithm. It can be observed that with dynamic $q$, the algorithm
captures almost all of the relevant items, even in the exploration phase of the algorithm. In
this scenario, the capacity $q$ is sufficient to explore almost all of the expected (with respect
to the posterior distribution) relevant items.


Figure 4.14: Learning with Dynamic q

Figure 4.15: Change in q

Regarding the behavior of the capacity $q$ in Figure 4.15, its value, as determined by
Equation 4.4, decays as the uncertainty about the values of the $p_s$ probabilities is resolved. As
mentioned above, as time increases the posterior variance of $Y_t$ converges to the sum of $S$
Bernoulli variances. The closer the initial prior is to the true $p_s$ values, the shorter
the time until the capacity $q$ stabilizes. We view the uniform initial prior as a worst-case
scenario absent any "wrong" knowledge.
1. Initialization:

(a) Set $a$ and $b$.

(b) Set $c$.

(c) Set $t = 0$.

(d) For each source $s$,

i. Draw $p_s$ from a Beta($a$, $b$).

ii. Set $y_{s,t} = 0$ and $n_{s,t} = 0$.

iii. Set $\alpha_s = 1$, $\beta_s = 1$ (if there is initial information about any source, set
accordingly).

2. Draw a sample $p_s$ for each source $s$ from a Beta($\alpha_s + y_{s,t}$, $\beta_s + n_{s,t} - y_{s,t}$), and sort
the values in increasing order $p_{(1)}, p_{(2)}, \ldots, p_{(S)}$.

3. From the $q$ sources corresponding to $p_{(S-q+1)}, \ldots, p_{(S)}$, draw a sample $x_s$ from
Bernoulli($p_s$).

4. Set $y_{s,t} = y_{s,t-1} + 1$ if $x_s = 1$ for sampled sources.

5. Set $n_{s,t} = n_{s,t-1} + 1$ for all $q$ sampled sources.

6. Calculate the mean and standard deviation of $Y_t$ according to the posterior
distribution at time $t$.

7. Set $q = E[Y_t] + \Phi^{-1}(c)\,\sigma_{Y_t}$.

8. Set $t = t + 1$.

9. Go back to 2.

Figure 4.16: Pseudocode of the Dynamic Algorithm
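
A possible rendering of the dynamic algorithm in Figure 4.16 is sketched below. Several details are assumptions made for illustration: the initial capacity is computed from the prior before the first period (the figure does not specify it), $q$ is rounded up and clamped to the range 1 to $S$, the multiplier `z` plays the role of $\Phi^{-1}(c)$ and defaults to two standard deviations, and the per-source variance is written in the algebraically equivalent form $\alpha\beta/(\alpha+\beta)^2$ of the summands in Equation 4.2. The function names are hypothetical.

```python
import math
import random


def posterior_capacity(alpha, beta, z):
    """Posterior mean of Y_t plus z posterior standard deviations."""
    mean = sum(a / (a + b) for a, b in zip(alpha, beta))
    var = sum(a * b / (a + b) ** 2 for a, b in zip(alpha, beta))
    return mean + z * math.sqrt(var)


def simulate_dynamic_q(p, horizon, rng, z=2.0):
    """Run Thompson Sampling with a capacity q that is updated each period."""
    S = len(p)
    alpha = [1.0] * S   # Beta(1, 1) prior for every source
    beta = [1.0] * S
    q_path = []
    for _ in range(horizon):
        # Steps 6 and 7: capacity from the current posterior (assumed rounding).
        q = min(S, max(1, math.ceil(posterior_capacity(alpha, beta, z))))
        q_path.append(q)
        # Steps 2 to 5: sample, select the q largest draws, observe, update.
        draws = [rng.betavariate(alpha[s], beta[s]) for s in range(S)]
        chosen = sorted(range(S), key=lambda s: draws[s])[-q:]
        for s in chosen:
            x = 1 if rng.random() < p[s] else 0
            alpha[s] += x
            beta[s] += 1 - x
    return q_path


rng = random.Random(3)
p = [rng.betavariate(0.02, 0.18) for _ in range(100)]
q_path = simulate_dynamic_q(p, horizon=200, rng=rng)
print(q_path[0], q_path[-1])
```

With a uniform prior and $S = 100$, the first capacity in this sketch is $50 + 2 \times 5 = 60$, and it shrinks as the posteriors concentrate.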


4.4 Risk vs. Resource Allocated 

Having analyzed how the capacity $q$ should change over time, in this section, we inspect
the effect of $q$ on the risk of missing relevant items. In order to do so, we first define a metric
to measure the risk. We define the risk in period $t$ as the conditional expectation:

$$E[\text{fraction of relevant items not explored in period } t \mid Y_t],$$

where $Y_t$ is the total number of relevant items in period $t$. For example, if nature sets the
total number of relevant items at time $t$ equal to 80, and the expected number of relevant items
explored by the algorithm equals 60 conditioned on the 80 relevant items, then the risk is
25%.

The expression above is difficult to compute analytically because it depends on the $q$
sources selected by the algorithm in period $t$, and $Y_t$ is unknown. This is the reason for
using Thompson sampling: it randomizes the selection of the sources to sample by
selecting the largest $q$ samples, each drawn from the posterior distribution of $p_s$.


The average risk over the time horizon $t = 1, \ldots, T$ is the grand average over the time
horizon $T$ of the risk in each period:

$$\mathrm{Risk}(T) = \frac{1}{T} \sum_{t=1}^{T} E[\text{fraction of relevant items not explored in period } t \mid Y_t].$$

The average risk is random, because it depends on the total number of relevant items in
each period, which are random and not known. The expected risk, $E[\mathrm{Risk}(T)]$, essentially
unconditions on the number of relevant items in each period and can be estimated by Monte
Carlo simulation.
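
A Monte Carlo estimate of the expected risk could look like the following sketch. It assumes that, in every period, nature generates an item from each source, that the analyst observes only the $q$ explored sources, and that a period with no relevant items contributes zero risk; that last convention, the function names, the seeds, and the small horizon are assumptions made for illustration. Sources are drawn from the mixture described in Subsection 4.2.2.

```python
import random


def sample_sources(S, rng):
    """Draw p_s: Beta(0.05, 0.95) with prob. 0.99, else Triangular(0, 1, 1)."""
    return [rng.betavariate(0.05, 0.95) if rng.random() < 0.99
            else rng.triangular(0.0, 1.0, 1.0) for _ in range(S)]


def average_risk(p, q, horizon, rng):
    """Average fraction of relevant items missed per period in one run."""
    S = len(p)
    alpha, beta = [1.0] * S, [1.0] * S
    missed = 0.0
    for _ in range(horizon):
        x = [1 if rng.random() < p[s] else 0 for s in range(S)]  # nature
        draws = [rng.betavariate(alpha[s], beta[s]) for s in range(S)]
        chosen = sorted(range(S), key=lambda s: draws[s])[-q:]
        found = 0
        for s in chosen:           # only explored sources update beliefs
            alpha[s] += x[s]
            beta[s] += 1 - x[s]
            found += x[s]
        total = sum(x)
        missed += (total - found) / total if total else 0.0
    return missed / horizon


def expected_risk(p, q, horizon, replications, seed=0):
    """Average the per-run risk over independent replications."""
    rng = random.Random(seed)
    runs = [average_risk(p, q, horizon, rng) for _ in range(replications)]
    return sum(runs) / replications


p = sample_sources(100, random.Random(7))
full = expected_risk(p, q=100, horizon=50, replications=3)
small = expected_risk(p, q=5, horizon=50, replications=3)
print(full, small)
```

Exploring every source (`q` equal to `S`) gives zero risk by construction, which provides a simple sanity check on the estimator.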


We provide a numerical example. For this purpose, after sampling $p_s$ once as described
in Subsection 4.2.2, we run the algorithm to learn about $p_s$ over 30 replications. The relation
between the capacity $q$ and the expected risk obtained is shown in Table 4.2 and Figure 4.17.
In the latter, a 95% confidence interval appears in light red, centered around the sample
average.

Figure 4.17: Tradeoff between Allocated Resource per Time Period (q) and the Risk

In Figure 4.18, we observe that the risk depends on both the resource allocated and the
time horizon. Although we could estimate the risk after truncating the initial learning period, we
do not do so, in order to penalize long learning periods. If the model is to be used for a long $T$,
then the effect of the initial exploration period naturally tends to zero. Otherwise, short time
horizons force the analyst to choose a greater $q$ to obtain the same risk level as with
a longer $T$.
Figure 4.18: Risk according to q and Time Horizon (plot title: Risk(Q))


Table 4.2: Estimated Risk for Allocated Resources (q) (S = 100 and T = 300)

    q    Risk      q    Risk      q    Risk      q    Risk
    10   0.33      20   0.137     30   0.06      40   0.034
    11   0.299     21   0.128     31   0.06      41   0.032
    12   0.269     22   0.118     32   0.054     42   0.03
    13   0.239     23   0.109     33   0.051     43   0.029
    14   0.223     24   0.101     34   0.048     44   0.027
    15   0.201     25   0.096     35   0.047     45   0.025
    16   0.186     26   0.087     36   0.044     46   0.024
    17   0.174     27   0.08      37   0.04      47   0.024
    18   0.164     28   0.076     38   0.039     48   0.022
    19   0.153     29   0.072     39   0.034     49   0.021
4.5 Using Posteriors to Learn about the Distribution of ps

As previously mentioned, the analyst often has historical data on relevant and non-relevant 
items, with each ps sampled from a mixture distribution. The difficulty for the analyst is 
that the component to which a given ps belongs is unobservable, although the zeros and 
ones can be ascribed to a particular source. The EM algorithm is applicable in exactly such 
situations. In this section, we provide two numerical examples of the method. The main 
idea is to treat the means of the posterior beta distributions as if they were samples from 
the unknown distribution of ps. 
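
The idea of fitting a two-component beta mixture to posterior means can be sketched as below. This is an illustrative sketch, not the fitting procedure used in the thesis. It rests on several assumptions: every source is observed the same number of times (`n_obs`), posteriors start from a Beta(1, 1) prior, responsibilities are initialized by splitting the data at 0.5, and the M-step uses weighted method-of-moments estimates in place of exact maximum likelihood, which has no closed form for beta distributions. All function names are hypothetical.

```python
import math
import random


def beta_pdf(x, a, b):
    """Beta(a, b) density, with x clamped away from 0 and 1 for stability."""
    x = min(max(x, 1e-9), 1 - 1e-9)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def moment_match(mean, var):
    """Beta parameters with the given mean and (clamped) variance."""
    var = min(max(var, 1e-6), 0.99 * mean * (1 - mean))
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def fit_two_beta_mixture(xs, n_iter=100, split=0.5):
    """Approximate EM; returns [(weight, alpha, beta), (weight, alpha, beta)]."""
    n = len(xs)
    resp = [1.0 if x < split else 0.0 for x in xs]  # responsibility of component 0
    if sum(resp) in (0, n):
        raise ValueError("initial split leaves one component empty")
    params = []
    for _ in range(n_iter):
        # M-step (approximate): weighted moments, then moment matching.
        params = []
        for r in (resp, [1 - v for v in resp]):
            total = sum(r)
            mean = sum(ri * x for ri, x in zip(r, xs)) / total
            var = sum(ri * (x - mean) ** 2 for ri, x in zip(r, xs)) / total
            params.append((total / n, *moment_match(mean, var)))
        # E-step: posterior probability that each point belongs to component 0.
        (w0, a0, b0), (w1, a1, b1) = params
        resp = []
        for x in xs:
            d0 = w0 * beta_pdf(x, a0, b0)
            d1 = w1 * beta_pdf(x, a1, b1)
            resp.append(d0 / (d0 + d1) if d0 + d1 > 0 else 0.5)
    return params


def sample_mixture(n, w_low, rng):
    """Draw n values of ps from w_low * Beta(1, 5) + (1 - w_low) * Beta(5, 1)."""
    return [rng.betavariate(1, 5) if rng.random() < w_low else rng.betavariate(5, 1)
            for _ in range(n)]


def posterior_means(p, n_obs, rng):
    """Posterior mean of each ps after n_obs Bernoulli observations, Beta(1, 1) prior."""
    means = []
    for ps in p:
        hits = sum(rng.random() < ps for _ in range(n_obs))
        means.append((1 + hits) / (2 + n_obs))
    return means


if __name__ == "__main__":
    rng = random.Random(0)
    xs = posterior_means(sample_mixture(1000, 0.25, rng), 100, rng)
    for weight, alpha, beta in fit_two_beta_mixture(xs):
        print(f"weight={weight:.2f}  alpha={alpha:.2f}  beta={beta:.2f}")
```

In the setting of this section, sources are observed unequally often, because the learning algorithm chooses which sources to inspect; the equal-count assumption above is made only to keep the sketch short.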

For the first simulation, we assume that the ps come from the mixture distribution 

0.1 × Beta(1, 5) + 0.9 × Beta(5, 1) 

and generate ps for 100 sources. We set q = 20 and T = 2000. 

The true density and the sampled ps are shown in Figure 4.19. The fitted density and the 
means of the posteriors are shown in Figure 4.20. The parameters estimated by the algorithm 
are shown in Table 4.3. Because comparatively few sources (an expected 10% of the 100) 
come from the first component, Beta(1, 5), its true parameters and weight were very hard 
to estimate. However, that component's effect on the fitted distribution is small compared 
with that of the other component. Comparing Figure 4.19 with Figure 4.20, we conclude 
that the EM algorithm generated a density that is similar to the true one. 

For the second example, we assume the mixture distribution 

0.25 × Beta(1, 5) + 0.75 × Beta(5, 1) 

and generate ps for 1000 sources. We set q = 200. The resulting fitted densities of the 
mixture population at selected time points (T = 2000, 4000, 6000, 8000) are shown in 
Figure 4.21. We observe that the fitted density approaches the true density (also shown in 
the figure) as more data becomes available. 

Figure 4.19: Assumed (True) Distribution and Histogram of 100 Samples 

Figure 4.20: Fitted Distribution and Histogram of 100 Means of Posteriors 


Table 4.3: Comparison of the True Weights and Parameters to Those Fitted 

                        True    Estimated
Component 1   Alpha     1       4.9
              Beta      5       28.8
              Mean      0.16    0.14
              Weight    0.9     0.63
Component 2   Alpha     5       1.65
              Beta      1       2.06
              Mean      0.84    0.44
              Weight    0.1     0.37

Figure 4.21: Fitted Mixture Distributions at Different Time Periods 

These two numerical illustrations suggest that the EM algorithm is useful in scenarios 
wherein nature generates relevant and non-relevant items from sources that have a 
parameter ps that itself is sampled from a mixture distribution. 

CHAPTER 5: 

Conclusion and Further Study 


In this chapter, we summarize the conclusions drawn from the analysis and propose some 
suggestions and scenarios for future research. 


5.1 Conclusion 

In this thesis, we focus on the problem of efficiently processing the vast amount of data 
handled within the intelligence cycle. We propose a learning model that can be used to 
allocate the available effort efficiently among the sources that generate data. We also 
suggest a method for dynamically adapting the amount of effort allocated as data becomes 
available. 

Our main conclusions and contributions can be summarized as follows: 

• The model described can be used to allocate the resources/efforts for collecting/processing 
efficiently. 

• The suggested algorithm employed in the model yields sublinear regret in the 
simulations we conducted, meaning that the average regret tends to zero as the 
number of time periods increases. 

• The model performs well when the ps are from either a pure or a mixture population. 

• The model can be adapted to situations in which there exists prior knowledge about 
the sources. 

• We allow the number of sources chosen (the capacity) to change over time as 
information becomes available. With this approach, intelligence agencies can better 
control the regret in the exploration phase and avoid using excess capacity once the ps 
values are better estimated. 

• The model can also be employed to gain insights about the risk, which provides 
further guidance for the capacity required. 

• We also use the EM algorithm to estimate the distributional parameters for the 
candidate subpopulations as the data is collected. 
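
The sublinearity statement in the list above can be written compactly. The notation R(T) for the cumulative regret after T periods is assumed here for illustration, and the square-root bound is only an example of a sublinear rate, not a rate established in the thesis.

```latex
% R(T): cumulative regret after T periods (assumed notation)
\[
  R(T) = o(T)
  \quad\Longleftrightarrow\quad
  \lim_{T \to \infty} \frac{R(T)}{T} = 0 .
\]
% Example of a sublinear rate: if R(T) \le C\sqrt{T} for some constant C > 0, then
\[
  \frac{R(T)}{T} \;\le\; \frac{C}{\sqrt{T}} \;\longrightarrow\; 0
  \quad \text{as } T \to \infty .
\]
```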

5.2 Further Study 

Further studies can be conducted by relaxing the assumptions and settings we established 
for our model and methods. 

First, the number of sources can be permitted to change, as some sources leave and new 
ones arrive. As an example of relaxing this assumption, one might consider translating a 
number of Twitter messages. Here, the Twitter accounts are the sources, and the messages 
are the items. Some of the sources may be inactive for some period of time; there may also 
be new accounts to look into or others that close. 

Second, the ps probabilities may be permitted to change over time. Third, and in our view 
most challenging, the dependencies between the sources and the item values over time 
could be captured. We intentionally did not specify what the source and the 
collection/processing asset are; focusing on a particular asset would determine the setting 
and assumptions. 